NSF PAR Search | NSF Public Access Repository

RT-BarnesHut: Accelerating Barnes-Hut Using Ray-Tracing Hardware

Nagarajan, Vani; Gangaraju, Rohan; Sundararajah, Kirshanthan; Pelenitsyn, Artem; Kulkarni, Milind (February 2025, ACM)

Full Text Available
SparseAuto: An Auto-scheduler for Sparse Tensor Computations using Recursive Loop Nest Restructuring

https://doi.org/10.1145/3689730

Dias, Adhitha; Anderson, Logan; Sundararajah, Kirshanthan; Pelenitsyn, Artem; Kulkarni, Milind (October 2024, Proceedings of the ACM on Programming Languages)

Automated code generation and performance enhancements for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to generate asymptotically better schedules for complex sparse tensor expressions using kernel fission and fusion. We present generalized loop restructuring transformations to reduce asymptotic time complexity and memory footprint. Furthermore, we present an auto-scheduler with a partially ordered set (poset)-based cost model that uses both time and auxiliary memory complexities to prune the search space of schedules. In addition, we highlight the use of Satisfiability Modulo Theories (SMT) solvers in sparse auto-schedulers to narrow the Pareto frontier of better schedules down to the smallest possible number of schedules, with user-defined constraints available at compile time. Finally, we show that our auto-scheduler can select better-performing schedules and generate code for them. Our results show that the schedules provided by the auto-scheduler achieve orders-of-magnitude speedup compared to the code generated by the Tensor Algebra Compiler (TACO) for several computations on different real-world tensors.
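The pruning idea in this abstract can be illustrated with a minimal Python sketch that keeps only the schedules not dominated under a partial order over (time, auxiliary memory) costs. The schedule names and the integer exponents standing in for asymptotic complexities are hypothetical; this is not the paper's cost model or its SMT encoding, only an illustration of Pareto-style pruning.

```python
from typing import List, NamedTuple


class Schedule(NamedTuple):
    name: str      # hypothetical label for a candidate schedule
    time_exp: int  # stand-in for asymptotic time complexity (exponent of n)
    mem_exp: int   # stand-in for auxiliary memory complexity (exponent of n)


def dominates(a: Schedule, b: Schedule) -> bool:
    """True if a is no worse than b in both costs and strictly better in one."""
    no_worse = a.time_exp <= b.time_exp and a.mem_exp <= b.mem_exp
    strictly_better = a.time_exp < b.time_exp or a.mem_exp < b.mem_exp
    return no_worse and strictly_better


def pareto_frontier(schedules: List[Schedule]) -> List[Schedule]:
    """Keep every schedule that no other schedule dominates."""
    return [s for s in schedules
            if not any(dominates(other, s) for other in schedules)]


# Hypothetical candidates: fusion saves memory, fission saves time.
candidates = [
    Schedule("fused", 4, 0),
    Schedule("fissioned", 3, 1),
    Schedule("fissioned_big_temp", 3, 2),
    Schedule("naive", 5, 0),
]
frontier = pareto_frontier(candidates)
```

In this sketch "fused" and "fissioned" are incomparable (one is faster, the other uses less memory), so both stay on the frontier, while the other two candidates are each dominated and pruned.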
Full Text Available
Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals

https://doi.org/10.1145/3652605

Singhal, Vidush; Sakka, Laith; Sundararajah, Kirshanthan; Newton, Ryan; Kulkarni, Milind (June 2024, ACM Transactions on Architecture and Code Optimization)

Many applications are designed to perform traversals on tree-like data structures. Fusing and parallelizing these traversals enhance the performance of applications. Fusing multiple traversals improves the locality of the application. The runtime of an application can be significantly reduced by extracting parallelism and utilizing multi-threading. Prior frameworks have tried to fuse and parallelize tree traversals using coarse-grained approaches, leading to missed fine-grained opportunities for improving performance. Other frameworks have successfully supported fine-grained fusion on heterogeneous tree types but fall short regarding parallelization. We introduce a new framework, Orchard, built on top of Grafter. Orchard’s novelty lies in allowing the programmer to transform tree traversal applications by automatically applying fine-grained fusion and extracting heterogeneous parallelism. Orchard allows the programmer to write general tree traversal applications in a simple and elegant embedded Domain-Specific Language (eDSL). We show that the combination of fine-grained fusion and heterogeneous parallelism performs better than each alone when the conditions are met.
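As a rough illustration of what fusing traversals means, the Python sketch below applies two passes over a binary tree separately and then as one hand-fused pass. The `Node` type and the function names are hypothetical, and the fusion is written by hand; it does not use Orchard's eDSL or Grafter, and it does not show the parallelism the abstract describes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def increment(node: Optional[Node]) -> None:
    """Traversal 1: add 1 to every node's value."""
    if node is None:
        return
    node.value += 1
    increment(node.left)
    increment(node.right)


def double(node: Optional[Node]) -> None:
    """Traversal 2: double every node's value."""
    if node is None:
        return
    node.value *= 2
    double(node.left)
    double(node.right)


def increment_and_double(node: Optional[Node]) -> None:
    """Fused traversal: one visit per node does the work of both passes."""
    if node is None:
        return
    node.value = (node.value + 1) * 2
    increment_and_double(node.left)
    increment_and_double(node.right)


def preorder(node: Optional[Node]) -> List[int]:
    """Collect values in preorder so two trees can be compared."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def build() -> Node:
    return Node(1, Node(2, Node(4)), Node(3))


unfused = build()
increment(unfused)
double(unfused)

fused = build()
increment_and_double(fused)
```

The fusion is valid here only because each pass reads and writes nothing but the value of the node it is visiting; each node is touched once instead of twice, which is the locality benefit the abstract refers to.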
Full Text Available
Towards Efficient Python Interpreter for Tiered Memory Systems

Li, Yuze; Yao, Shunyu; Mobin, Jaiaid; Rafique, M. Mustafa; Nikolopoulos, Dimitrios; Sundararajah, Kirshanthan; Li, Huaicheng; Butt, Ali R. (February 2024, Poster and Work-in-Progress in Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST))

Full Text Available
Towards Efficient Python Interpreter for Tiered Memory Systems

Li, Yuze; Yao, Shunyu; Mobin, Jaiaid; Rafique, M. Mustafa; Nikolopoulos, Dimitrios; Sundararajah, Kirshanthan; Li, Huaicheng; Butt, Ali R. (February 2024, Conference poster and Work-in-Progress in Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST))

Full Text Available
UniRec: a unimodular-like framework for nested recursions and loops

https://doi.org/10.1145/3563333

Sundararajah, Kirshanthan; Saumya, Charitha; Kulkarni, Milind (October 2022, Proceedings of the ACM on Programming Languages)

Scheduling transformations reorder operations in a program to improve locality and/or parallelism. There are mature loop transformation frameworks such as the polyhedral model for composing and applying instance-wise scheduling transformations for loop nests. In recent years, there have been efforts to build frameworks for composing and applying scheduling transformations for nested recursion and loops, but these frameworks cannot employ the full power of transformations for loop nests since they have overly-restrictive representations. This paper describes a new framework, UniRec, that not only generalizes prior frameworks for reasoning about transformations on recursion, but also generalizes the unimodular framework, and hence unifies reasoning about perfectly-nested loops and recursion.
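For readers unfamiliar with the unimodular framework mentioned here, the following Python sketch shows the classical loop-only case: a transformation is an integer matrix with determinant +1 or -1 applied to iteration vectors, and the new execution order is the lexicographic order of the transformed vectors. The helper names and the 3-by-2 iteration space are assumptions for illustration; the sketch does not cover UniRec's extension to recursion.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Point = Tuple[int, int]


def det2(m: Matrix) -> int:
    """Determinant of a 2x2 integer matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def apply(m: Matrix, p: Point) -> Point:
    """Map iteration vector (i, j) to its transformed coordinates."""
    i, j = p
    return (m[0][0] * i + m[0][1] * j, m[1][0] * i + m[1][1] * j)


INTERCHANGE: Matrix = [[0, 1], [1, 0]]  # (i, j) -> (j, i)
SKEW: Matrix = [[1, 0], [1, 1]]         # (i, j) -> (i, i + j)

# Iteration space of: for i in range(3): for j in range(2): ...
original = [(i, j) for i in range(3) for j in range(2)]

# Sorting the transformed vectors gives the transformed execution order.
interchanged = sorted(apply(INTERCHANGE, p) for p in original)
skewed = sorted(apply(SKEW, p) for p in original)
```

Because both matrices are unimodular, each one maps the integer iteration points one-to-one onto new integer points, so no iteration is lost or duplicated. Whether a given transformation is legal still depends on the loop's data dependences, which this sketch does not check.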
Full Text Available
HyBF: A Hybrid Branch Fusion Strategy for Code Size Reduction

https://doi.org/10.1145/3578360.3580267

Rocha, Rodrigo C.; Saumya, Charitha; Sundararajah, Kirshanthan; Petoumenos, Pavlos; Kulkarni, Milind; O’Boyle, Michael F. (February 2023, ACM)

Full Text Available
SparseLNR: accelerating sparse tensor computations using loop nest restructuring

https://doi.org/10.1145/3524059.3532386

Dias, Adhitha; Sundararajah, Kirshanthan; Saumya, Charitha; Kulkarni, Milind (June 2022, International Conference on Supercomputing)

Full Text Available
